ABSTRACT

During prokaryotic CRISPR adaptation, a copy of a segment of foreign deoxyribonucleic acid, referred to as a protospacer, is added to the CRISPR cassette and becomes a spacer. When a protospacer contains a neighboring target interference motif, the specific small CRISPR ribonucleic acid (crRNA) transcribed from the expanded CRISPR cassette can protect a prokaryotic cell from virus infection or from plasmid transformation and conjugation. We show that in Escherichia coli, a vast majority of plasmid protospacers generate spacers integrated into the CRISPR cassette in two opposing orientations, leading to the frequent appearance of complementary spacer pairs in a population of cells that underwent CRISPR adaptation. When a protospacer contains the spacer acquisition motif AAG, the spacer orientation that generates functional protective crRNA is strongly preferred. All other protospacers give rise to spacers oriented in both ways at comparable frequencies. This phenomenon increases the repertoire of available spacers and should make it more likely that a protective crRNA is formed as a result of CRISPR adaptation.

INTRODUCTION 

CRISPR-Cas are diverse small RNA-based adaptive immunity systems of prokaryotes. A typical CRISPR-Cas system consists of a CRISPR (Clusters of Regularly Interspaced Short Palindromic Repeats) cassette, a deoxyribonucleic acid (DNA) locus consisting of a number of identical repeated sequences separated by variable spacers (1-3), and several cas genes (4,5). The non-coding CRISPR cassette transcript (pre-crRNA) is processed by endonucleolytic cuts inside the repeat sequences (6,7). As a result, a set of crRNAs is produced, each containing identical flanking sequences originating from repeats and variable internal sequences corresponding to CRISPR spacers (6,8). Individual crRNAs associate with Cas proteins and guide them to double-stranded DNA (and in some cases, RNA (9,10)) targets matching the spacer sequence (6,8,11). Such sequences are referred to as protospacers (12). When a DNA protospacer fully matches a crRNA spacer and contains an additional short motif (13,14), referred to as the target interference motif (TIM) (15), an R-loop is formed at the site of recognition and the target DNA is destroyed (16,17). Thus, when CRISPR spacers match functional TIM-associated sequences in mobile genetic elements such as bacteriophages or plasmids, the cell is able to purge such elements, protecting itself from bacteriophage infection or plasmid conjugation/transformation (3,16,18-22). This protective function is referred to as CRISPR interference (19).

A set of spacers present in a CRISPR cassette determines the ability of a CRISPR-Cas system to recognize mobile genetic elements and protect the host prokaryote from them (12,14,18). Spacers are acquired from bacteriophage or plasmid DNA in a pseudo-Lamarckian process. This poorly understood process is referred to as CRISPR adaptation (3,16,18-21). The two most conserved Cas proteins, Cas1 and Cas2, are dispensable for CRISPR interference (6) but essential for CRISPR adaptation (23). For the adaptation process to lead to subsequent interference, the adaptation machinery must insert into the host CRISPR cassette foreign DNA fragments associated with TIM sequences that can promote subsequent interference. In addition, self and non-self DNA must somehow be differentiated, since acquisition of spacers from self DNA would lead to an autoimmune response (24-26).

Escherichia coli Cas1 and Cas2 alone, when overexpressed in the absence of other Cas proteins or crRNA, are capable of causing new spacer acquisition, and analysis of the corresponding protospacers shows that many are preceded by an AWG motif (23), referred to as the spacer acquisition motif (SAM) (15). Four functional TIMs, ATG, AAG, AGG and GAG, are recognized by the E. coli Cse1 protein, a part of the large interfering Cascade-crRNA complex, during CRISPR interference (17). Thus, the AAG and ATG SAMs are also functional TIMs (17), which ensures that many adaptation events lead to interference-capable spacers. In addition to the SAM, an 'acquisition affecting motif' (AAM) has been reported within some protospacers that are highly preferred substrates of acquisition (27).

In addition to the Cas1/Cas2-dependent, Cascade-crRNA-independent adaptation process (referred to as 'unprimed'), a much more efficient 'primed adaptation' has been described (28-30). This process requires not just Cas1 and Cas2 but also Cascade, and is activated by the presence of target DNA that contains point substitutions in the TIM or the protospacer that render crRNA recognition and CRISPR interference inefficient (28). Under these conditions, the residual 'priming' interaction of the Cascade-crRNA complex with the mutated target strongly stimulates acquisition of spacers from protospacers located in cis with respect to the priming protospacer. Primed spacer acquisition is characterized by a very strong preference for protospacers with an AAG SAM (28,31) and is obviously highly adaptive, as it allows the cell to specifically target foreign DNA that has 'escaped' the previous line of CRISPR-Cas defense by acquiring point mutations in the protospacer or TIM.

The enzymology of CRISPR adaptation is currently unknown. Yet, it is clear that Cas1, Cas2 (23,28) or both of these proteins (32), possibly along with some yet-to-be-identified host factors (18), must recognize a donor DNA fragment and then initiate a sequence of events that leads either to copying of this fragment or to its excision and physical transfer into the recipient CRISPR cassette. As a result, the cassette is expanded by the addition of a new spacer and an extra repeat copy, which must arise through replication of a preexisting repeat (23,28,29,33). Here, we analyze multiple spacer acquisition events during primed and unprimed CRISPR adaptation in E. coli. We observe that for a given protospacer, acquisition events lead to both possible orientations of the resulting spacers. The two orientations occur at similar frequencies when the protospacer from which the complementary spacers are produced does not contain an AAG SAM, thus increasing the chance that a crRNA recognizing a protospacer with a functional TIM is produced. In contrast, for protospacers with an AAG SAM (which is also a functional TIM), a single spacer orientation, the one that leads to functional, protective crRNA, is strongly favored.

MATERIALS AND METHODS 

E. coli strains, plasmids and the primed adaptation experiment

The E. coli KD263 strain is a derivative of the BW40119 strain described earlier (28). It contains the cas3 gene under the control of the lacUV5 promoter and the casABCDE12 operon under the control of the araBp8 promoter. The KD263 strain harbors a single genetically modified CRISPR cassette with two repeats and a single g8 spacer described earlier (14). KD263 was transformed with the pG8_C1T plasmid, a derivative of the pT7blue cloning vector harboring a 209-bp fragment of the M13 bacteriophage DNA containing the g8 protospacer (28). The protospacer sequence harbors a C to T change at position +1 that renders CRISPR interference by the g8 spacer-containing crRNA ineffective (14). KD263 cells transformed with pG8_C1T were grown overnight at 37°C in Luria-Bertani (LB) broth supplemented with 100 μg/ml ampicillin. Aliquots of the culture were diluted 200-fold into six individual tubes with fresh LB broth without ampicillin, supplemented with IPTG (isopropyl β-D-1-thiogalactopyranoside) and arabinose to a final concentration of 1 mM each. The cultures were grown at 37°C overnight. The six individual cultures were mixed and genomic DNA was isolated from the pooled cultures. The cells were lysed by 2-min incubation with 1 mg/ml lysozyme, and DNA was purified by phenol, phenol/chloroform and chloroform extractions followed by ethanol precipitation. CRISPR expansion was monitored by polymerase chain reaction (PCR) in 20 μl reactions containing 20-50 ng genomic DNA, with primers matching the CRISPR leader sequence and the g8 spacer, Ec_LDR-F (5'-AAGGTTGGTGGGTTGTTTTTATGG-3') and M13g8 (5'-GGATCGTCACCCTCAGCAGCG-3'), using Phusion High-Fidelity DNA Polymerase (New England Biolabs). Six independent amplification reactions were pooled, and PCR products corresponding to the expanded CRISPR cassette were gel purified using the QIAquick Gel Extraction Kit (QIAGEN) and sequenced on the Illumina MiSeq system at the Moscow State University Genomics facility as described (31).

Data processing 

Raw sequencing data were analyzed using the ShortRead and Biostrings (34) packages. Illumina sequencing reads were filtered for quality scores of >20, and reads containing two repeats (with up to two mismatches) were selected. Reads that contained 33-bp sequences between the two CRISPR repeats were then selected, and these 33-bp segments were considered spacers. Spacers were next mapped on the pG8_C1T plasmid with no mismatches allowed. R scripts and the ggplot2 package (35) were used for spacer statistics and graphical representation. Logo construction was done with http://weblogo.berkeley.edu (36).
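The read-processing steps above can be sketched in Python as a minimal, hypothetical stand-in for the R/ShortRead/Biostrings scripts; it is not the authors' code. The helper names, the simple left-to-right Hamming-distance scan for repeat copies and the toy repeat in the usage lines are assumptions, and quality filtering is omitted. A read yields a spacer only if two repeat copies (up to two mismatches each) flank a segment of exactly 33 bp, and each spacer is then matched without mismatches against both strands of the circular donor plasmid.

```python
# Hypothetical sketch of spacer extraction and exact mapping; not the authors' R code.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat(read: str, repeat: str, start: int, max_mm: int = 2) -> int:
    """First index >= start where `repeat` matches with <= max_mm mismatches, else -1."""
    k = len(repeat)
    for i in range(start, len(read) - k + 1):
        if hamming(read[i:i + k], repeat) <= max_mm:
            return i
    return -1


def extract_spacer(read: str, repeat: str, spacer_len: int = 33, max_mm: int = 2):
    """Segment between two repeat copies, or None unless it is exactly spacer_len long."""
    first = find_repeat(read, repeat, 0, max_mm)
    if first < 0:
        return None
    second = find_repeat(read, repeat, first + len(repeat), max_mm)
    if second < 0:
        return None
    segment = read[first + len(repeat):second]
    return segment if len(segment) == spacer_len else None


def map_spacer(spacer: str, plasmid: str) -> list:
    """Exact hits of a spacer on both strands of a circular plasmid.

    Returns (strand, start) tuples; start is the 0-based index in `plasmid`
    where the spacer ('+') or its reverse complement ('-') begins.
    """
    circular = plasmid + plasmid[:len(spacer) - 1]  # allow hits spanning the origin
    hits = []
    for strand, query in (("+", spacer), ("-", revcomp(spacer))):
        i = circular.find(query)
        while i != -1:
            hits.append((strand, i))
            i = circular.find(query, i + 1)
    return hits


toy_repeat = "GGGGTTTTGGGG"  # toy sequence, not the real E. coli repeat
read = "TT" + toy_repeat + "ACACACACAC" + toy_repeat + "GG"
print(extract_spacer(read, toy_repeat, spacer_len=10))  # ACACACACAC
print(map_spacer("GATCCGTA", "TTTTGATCCGTAAGGCTTTT"))   # [('+', 4)]
```

Because the scan accepts the first window within the mismatch limit, a real pipeline would also need to guard against spurious repeat hits inside spacers; the toy repeat was chosen so that this cannot happen in the example.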

RESULTS 

Experimental primed adaptation set up 

The E. coli strain KD263, with cas genes fused to inducible promoters and a CRISPR cassette containing two repeats and an M13 phage-derived g8 spacer, was transformed with a pT7blue-based plasmid containing a fragment of the M13 phage with the g8 protospacer. The protospacer contained a point mutation that introduced a single mismatch between the target DNA and g8 crRNA at the +1 position (see below). Elsewhere, we have shown that such a mismatch renders CRISPR interference inactive but strongly stimulates primed adaptation from DNA located in cis to the protospacer (28). Expression of cas genes was then induced and cell growth was continued in the absence of antibiotics, thus allowing cells that lost the plasmid to survive. PCR analysis with appropriate primers revealed that more than 50% of cells had expanded their CRISPR cassettes after overnight growth without antibiotics. The growth kinetics of plasmid-bearing KD263 was not affected by induction of cas gene expression (data not shown). Under the conditions of our experiment, there is no strong selective pressure for the cells to acquire an interference-capable spacer (and no penalty for acquiring a non-functional spacer). While cells that acquired a functional spacer may be able to lose the plasmid faster and thus gain a small selective advantage during growth in the absence of antibiotic, cells that acquired a non-functional spacer also survive and propagate. Thus, the distribution of spacers (and protospacer choice) in our experiment should largely reflect the preferences of the acquisition machinery only.

A DNA band corresponding to the CRISPR cassette expanded by a single repeat-spacer unit was purified and subjected to deep-coverage high-throughput sequencing and analysis. A flow chart describing the general outcome of this analysis is presented in Figure 1. A total of 1 934 605 sequences of newly acquired spacers (defined here as 33-bp DNA fragments separating CRISPR repeat sequences in Illumina reads) were obtained. Of these, 80.4% (1 555 829) were mapped with no mismatches to sequences (protospacers) from the plasmid used to transform the cells. Ninety-seven percent of the mapped spacers originated from protospacers with an AAG SAM (Figure 1), and 93.3% of spacers were mapped to protospacers located in the plasmid DNA strand that is not targeted by the g8 crRNA. The strong preference for an AAG SAM and the bias toward the non-targeted strand are both hallmarks of primed adaptation (28,31).

The 1 555 829 spacers mapped to just 1584 unique protospacers (Figure 1). Among the unique protospacers, just 7% contained an AAG SAM. Further, the unique protospacers were equally distributed between both strands of plasmid DNA (51.7% of unique protospacers were located in the targeted strand; Figure 1).

Identification of conserved sequence motifs in protospacers used during primed adaptation

The analysis depicted in Figure 1 demonstrates that the trends revealed when unique protospacers are analyzed are different from those revealed when the frequency of the resulting spacers is considered. This occurs because some protospacers contribute a disproportionately large number of spacers. At our coverage, spacers from protospacers with an AAG SAM were selected, on average, 13 483 times (the actual spread of this value was very wide, ranging from 20 to 217 955 times; median = 2352). The different frequencies of protospacer choice are unlikely to be an artifact of PCR amplification and library construction, since a good correlation in relative spacer frequencies is observed in independent experiments ((31) and data not shown). Therefore, the frequency of spacer occurrence in the sequencing data is a good measure of protospacer choice efficiency. Spacers from protospacers without an AAG SAM were selected on average less often (from 1 to 5639 times; median = 3). Although spacers acquired from protospacers with an AAG SAM were much more frequent, such protospacers constituted only 7% of the unique protospacers in our data set.
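Why the unique-protospacer and read-weighted views diverge can be shown with a toy tally. The counts below are invented for illustration only and are not data from this study, and the helper name is hypothetical: two heavily used AAG protospacers form a small share of unique protospacers yet dominate the spacer reads, the same kind of contrast as 7% versus 97% in our data.

```python
# Toy illustration with invented counts (not data from this study).
tally = {
    # protospacer id: (SAM trinucleotide, number of spacer reads)
    "p1": ("AAG", 5000),
    "p2": ("AAG", 3000),
    "p3": ("ACG", 3),
    "p4": ("TTG", 1),
    "p5": ("CAG", 2),
    "p6": ("GGA", 1),
    "p7": ("ATC", 5),
    "p8": ("AAC", 4),
    "p9": ("TCA", 2),
    "p10": ("GCG", 1),
}


def aag_fractions(tally):
    """Return (share of unique protospacers with AAG, share of spacer reads from them)."""
    aag_reads = [n for sam, n in tally.values() if sam == "AAG"]
    total_reads = sum(n for _, n in tally.values())
    return len(aag_reads) / len(tally), sum(aag_reads) / total_reads


unique, weighted = aag_fractions(tally)
print(f"{unique:.0%} of unique protospacers, {weighted:.1%} of spacer reads")
# -> 20% of unique protospacers, 99.8% of spacer reads
```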

We constructed a LOGO for all 1584 unique protospacers and the adjacent five upstream and five downstream nucleotides. The 'upstream' and 'downstream' directions were set by the orientation of the corresponding spacer in the CRISPR cassette. In E. coli, 33-bp DNA fragments are inserted into the CRISPR cassette during spacer acquisition (33). When considering a target DNA sequence that serves as a spacer donor, we number SAM/TIM residues from −2 to 0 (i.e. A−2, A−1 and G0 for an AAG SAM). G0 is also the first residue of the protospacer (28,29,33). Subsequent residues of the protospacer are numbered consecutively up to position +32. The numbering then continues (+33, +34, etc.) downstream of the protospacer. The results of our LOGO analysis are presented in Figure 2A. Although the frequency of protospacer use is not taken into account during LOGO construction, the upstream AAG motif was observed, owing to the 7% of unique protospacers with an AAG SAM (above). A guanine at position 0 was most strongly conserved (recall that in a LOGO representation, the overall height of the stack indicates the sequence conservation at the position being investigated, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at this position (36)). No base preferences were observed at individual positions throughout protospacer positions +1 to +31. Surprisingly, a weak but clearly detectable preference for a C at position +32 was detected.
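The numbering convention and the LOGO stack heights can be sketched as follows. This is an illustrative reimplementation, not WebLogo: the function names are hypothetical, positions follow the convention above (−2 to 0 for the SAM, 0 to +32 for the protospacer) with the five-nucleotide flanks used here, and the stack height is computed as 2 bits minus the Shannon entropy, without the small-sample correction WebLogo applies.

```python
import math
from collections import Counter


def context_window(strand_seq, pos0, upstream=5, downstream=5, proto_len=33):
    """Map positions -upstream .. proto_len-1+downstream to bases, wrapping circularly.

    strand_seq is the plasmid strand carrying the spacer sequence; pos0 is the
    0-based index of protospacer residue 0 (the G of an AAG SAM) on that strand.
    """
    n = len(strand_seq)
    return {p: strand_seq[(pos0 + p) % n]
            for p in range(-upstream, proto_len + downstream)}


def information_content(windows, position):
    """Logo stack height in bits at one position: 2 - Shannon entropy.

    No small-sample correction is applied (WebLogo applies one).
    """
    counts = Counter(w[position] for w in windows)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy


# Toy check: a fully conserved position gives 2 bits, an evenly mixed one 0 bits.
conserved = [{0: "G"} for _ in range(4)]
mixed = [{0: b} for b in "ACGT"]
print(information_content(conserved, 0), information_content(mixed, 0))  # 2.0 0.0
```

With the default arguments each window spans positions −5 to +37, i.e. 43 nucleotides.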

In principle, a preference for a C at position +32 may signal a requirement for a sequence that functions together with the upstream SAM or, alternatively, may be indicative of a SAM-independent protospacer choice signal. To distinguish between these possibilities, we constructed a LOGO only for those protospacers that contained a G at position 0. As can be seen from Figure 2B, the preference for a C at position +32 was decreased in this group of protospacers. A LOGO for the reciprocal set of protospacers, i.e. those that do not contain a G at position 0, showed an increased preference for a C at position +32 (Figure 2C). Moreover, a weak preference for T at positions +33 and +34 became evident. A complementary analysis considered protospacers with or without a C at position +32. The former group of protospacers had a very weak preference for a G at position 0 and no preference at positions −1 and −2 (Figure 2D). In the latter group of protospacers, the preference for A−2A−1G0 became stronger than in the total protospacer set (compare Figure 2A and E). The results thus show that (i) cytosine is preferentially found as the last residue of protospacers (position +32) and (ii) the presence of a G originating from a consensus SAM sequence at position 0 decreases the likelihood of finding a C at position +32. The opposite is also true.
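A simple coordinate argument helps show why a G at position 0 and a C at position +32 can fall into different protospacer subsets. Assuming, as a working hypothesis, that a spacer can also be inserted in the inverted orientation, the same 33 bp read from the opposite strand renumbers position p as 32 − p and complements each base, so the SAM residue G0 becomes C+32, and A−1 and A−2 become T+33 and T+34. The sketch below (function name hypothetical) applies this renumbering to a position-to-base map.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip_frame(window, proto_len=33):
    """Renumber a position->base map for the same DNA read from the opposite strand.

    Position p (0 = first protospacer residue, proto_len - 1 = last) becomes
    (proto_len - 1) - p, and each base is complemented.
    """
    last = proto_len - 1
    return {last - p: COMPLEMENT[b] for p, b in window.items()}


# An AAG SAM at positions -2..0 of the original frame...
original = {-2: "A", -1: "A", 0: "G"}
# ...appears as C+32, T+33, T+34 in the frame of the inverted spacer.
print(sorted(flip_frame(original).items()))  # [(32, 'C'), (33, 'T'), (34, 'T')]
```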

Figure 1. A workflow and statistics of the analysis of high-throughput sequencing data on acquisition of plasmid-derived spacers into the Escherichia coli CRISPR cassette. See text for details. Results for all (left) and unique (right) spacers are summarized.

Most spacers acquired during primed adaptation are inserted in both orientations

The numerical predominance of non-AAG protospacers over AAG protospacers in our unique protospacer set may be partially explained by interdependencies in spacer choice. For example, up to four related protospacers associated with non-AAG motifs can arise when spacers are selected 'incorrectly', one or two nucleotides downstream or upstream of the predominant 'correct' site determined by the location of a consensus AAG SAM (31). To reveal interdependencies of protospacer choice, overlapping or neighboring protospacers that started in the region extending from position −20 to position +49 of every unique protospacer present in our data set were identified on both strands. Every time position 0 of an overlapping or neighboring protospacer mapped to a position within the −20/+49 interval of the protospacer being considered, a score of +1 was added to this position. Figure 3A shows a histogram of the total scores for each surveyed position, summed over all unique protospacers in our collection. By construction, our procedure results in a score of 1584, which equals the total number of unique protospacers, for position 0. For most other positions, the score is uniform and averages about 400. The only clear exception is position +32, where an excess of overlapping opposite-strand protospacers starting at this position is observed. A closer inspection also reveals that for both the 0 and +32 positions, the immediately adjacent positions on the same strand (−1, −2, +1 and +2 for position 0; +30, +31, +33 and +34 for position +32) tend to be used more often as overlapping protospacer start points, a likely indication of imprecise protospacer selection, or 'slippage', following the initial recognition of the AAG SAM sequence mentioned above.
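The −20/+49 scoring procedure can be sketched as a double loop over unique protospacers. The coordinate convention (plus-strand position of residue 0, with minus-strand protospacers read leftwards), the circular wrap and the function name are our assumptions; the usage lines simply show how a flipped mate of a protospacer lands at +32 on the opposite strand and a slipped neighbor at ±1 on the same strand.

```python
from collections import Counter


def overlap_scores(protospacers, plasmid_len, lo=-20, hi=49):
    """Histogram of neighbouring protospacer start points around each unique protospacer.

    protospacers: iterable of (strand, x), x = plus-strand coordinate of residue 0.
    '+' protospacers read rightwards from x; '-' protospacers read leftwards from x.
    Returns two Counters {relative position: score}: same-strand and opposite-strand.
    The focal protospacer itself scores +1 at position 0 of the same-strand histogram.
    """
    items = set(protospacers)
    same, opposite = Counter(), Counter()
    for focal_strand, fx in items:
        for other_strand, ox in items:
            # Position of the other protospacer's residue 0 in the focal frame.
            d = ox - fx if focal_strand == "+" else fx - ox
            # Wrap onto a window centred on 0, because the plasmid is circular.
            d = (d + plasmid_len // 2) % plasmid_len - plasmid_len // 2
            if lo <= d <= hi:
                (same if other_strand == focal_strand else opposite)[d] += 1
    return same, opposite


# An AAG protospacer, its flipped mate (opposite strand, starting at +32)
# and a one-nucleotide 'slipped' neighbour on the same strand.
same, opposite = overlap_scores([("+", 100), ("-", 132), ("+", 101)], 3095)
print(sorted(same.items()), sorted(opposite.items()))
# -> [(-1, 1), (0, 3), (1, 1)] [(31, 2), (32, 2)]
```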

Figure 2. Local context of protospacers used during primed adaptation. Protospacer region LOGOs built for all protospacers in the data set (A), for protospacers with a G at position 0 (B), for protospacers lacking a G at this position (C), for protospacers with a C at position +32 (D) and for protospacers lacking a C at this position (E).

The clear overrepresentation of overlapping protospacers beginning at position +32 suggests that self-complementary spacers exist in our collection. This notion is also supported by the results of the LOGO analysis, since a G at position 0 and a C at position +32 tend to be present in different subpopulations of protospacers (the same is also true for protospacers with an AAG SAM and the complementary downstream CTT motif; Figure 2A and C). In other words, the CTT motif characteristic of some protospacers (Figure 2C) may arise during the acquisition of a spacer derived from a protospacer with an AAG SAM in an inverted orientation. Indeed, a direct search revealed that 65% of unique protospacers in our data set formed self-complementary pairs. The remaining 35% of unique protospacers, for which no pairs could be found, contributed just 0.3% of the total number of spacers, i.e. they were very rare, and the corresponding protospacers were presumably poor spacer donors. The median counts of spacers with and without a complementary mate were 6 and 1, respectively. In principle, both protospacers within a pair in our data set could have been selected independently. If true, this would make the observed pairs essentially random. However, this notion does not match the fact that only 1584 out of 6190 possible unique plasmid-derived spacers (note that the donor plasmid is 3095 bp long) are observed in our data set: with such sparse sampling, independent selection would be expected to produce complementary partners far less often than observed. Therefore, the appearance of paired spacers originating from the same protospacer reflects an inherent property of the acquisition mechanism.
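Pair detection and the argument against independent selection can be sketched as follows. The function names are hypothetical, and the uniform-sampling baseline is a deliberate simplification (real acquisition strongly favors AAG protospacers), so it only indicates the order of magnitude: if 1584 of 6190 possible spacers were drawn at random, roughly a quarter of them would be expected to have their mate observed, well below the 65% found.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_pairs(spacer_counts):
    """Return (more common, less common) pairs of observed spacers that are reverse complements."""
    pairs, seen = [], set()
    for s in spacer_counts:
        rc = revcomp(s)
        if rc in spacer_counts and rc != s and s not in seen:
            seen.update((s, rc))
            more, less = sorted((s, rc), key=lambda x: spacer_counts[x], reverse=True)
            pairs.append((more, less))
    return pairs


def paired_fractions(spacer_counts):
    """Share of unique spacers, and of spacer reads, that belong to a complementary pair."""
    paired = {s for pair in complementary_pairs(spacer_counts) for s in pair}
    unique_share = len(paired) / len(spacer_counts)
    read_share = sum(spacer_counts[s] for s in paired) / sum(spacer_counts.values())
    return unique_share, read_share


def expected_pair_share(observed_unique, possible):
    """Chance that an observed spacer's mate is also observed if observed spacers
    were a uniformly random subset of all possible ones (a rough baseline only)."""
    return (observed_unique - 1) / (possible - 1)


possible = 2 * 3095  # one spacer per start position and strand of the circular plasmid
print(possible, round(expected_pair_share(1584, possible), 3))  # 6190 0.256
```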

Nucleic Acids Research, 2014, Vol. 42, No. 9 5913

To further investigate the spacer pairs present in our data set, the more common spacer was identified for each pair. The more common spacers were then sorted according to the number of times they occurred in our data set. The distribution of observed counts in the sorted array is shown in Figure 3B (in the figure, the more common spacers from each pair are presented above the horizontal axis). The corresponding less common pair-mated spacers were not sorted separately; instead, their counts were plotted below the horizontal axis under the corresponding more common pair mate. As expected (31), spacers corresponding to protospacers with an AAG SAM (indicated in red) were the predominant ones, both overall and within a pair. These spacers should generate functional crRNA capable of interfering with target DNA through recognition of an AAG TIM. The rarer complementary spacers could only result in functional crRNA when a downstream interference-capable TIM was present, presumably an event determined by chance. For pairs of spacers derived from AAG protospacers, the median ratio between 'correct' (functional) and 'incorrect' (and therefore likely non-functional) complementary spacers was 194 (Figure 3C). For non-AAG protospacers (indicated in blue and green in Figure 3B), the median ratio between the more and less common spacers within a pair was 4. These values are different with a P-value of less than 2.2e−16 according to the Wilcoxon test. Spacers that were selected by imprecise choice/slippage following AAG SAM recognition (indicated in blue in Figure 3B) were analyzed as a separate group. The median ratio between the more and less common spacers within pairs from this 'near-AAG' group was 9 (Figure 3C), which is significantly different from the value obtained for AAG protospacers (P-value 2e-16).
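The within-pair summary and the group comparison can be sketched with the Python standard library. The input format and function names are hypothetical, the ratios in the usage lines are invented, and the test is a simplified normal-approximation version of the Wilcoxon rank-sum test; in an R workflow like the one described in Methods, wilcox.test would be the usual tool.

```python
import math
from statistics import median


def median_ratio(pairs):
    """Median of (more common count / less common count) over spacer pairs."""
    return median(more / less for more, less in pairs)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (U, p) with U the Mann-Whitney statistic for x. No tie or
    continuity correction is applied, so p differs slightly from R's wilcox.test.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy per-pair ratios (invented): AAG-derived pairs versus non-AAG pairs.
aag_ratios = [194, 200, 150, 300, 180]
non_aag_ratios = [4, 3, 5, 2, 6]
u, p = rank_sum_test(aag_ratios, non_aag_ratios)
print(u, p < 0.05)  # 25.0 True
```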

The Qimron lab has recently reported an analysis of high-throughput sequencing data of acquired E. coli spacers that suggested the presence of a downstream 'AAM' motif within frequently used protospacers (27). This motif could not be revealed by LOGO analysis but was identified as a difference in nucleotide frequency at each protospacer position between groups of frequently and rarely acquired spacers. It was therefore proposed that an AAG SAM and the downstream motif jointly affect the efficiency of spacer acquisition. No downstream motif could be identified in highly used protospacers present in our data set. Thus, the preferential usage of protospacers in our experimental system does not appear to require any additional signals other than an AAG SAM. It should be noted, however, that the downstream motif was observed only among host-derived spacers and not among plasmid-derived spacers. Cells that acquired host-derived spacers should be purged from the population under our conditions due to an autoimmune response (26), and the corresponding spacers should thus evade detection.
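The AAM-style comparison described above, a per-position difference in nucleotide frequency between frequently and rarely acquired spacers rather than a LOGO, can be sketched as follows. The split into 'frequent' and 'rare' groups by a read-count threshold and the function names are assumptions for illustration; Yosef et al. (27) may have defined the groups differently.

```python
from collections import Counter


def base_frequencies(spacers, position):
    """Fraction of each base at one position across equal-length spacers."""
    counts = Counter(s[position] for s in spacers)
    return {b: counts[b] / len(spacers) for b in "ACGT"}


def frequency_difference(spacer_counts, threshold):
    """Per-position (frequent minus rare) nucleotide frequency differences.

    Spacers seen at least `threshold` times form the 'frequent' group, the rest
    the 'rare' group (a hypothetical split). Returns one {base: difference} per position.
    """
    frequent = [s for s, n in spacer_counts.items() if n >= threshold]
    rare = [s for s, n in spacer_counts.items() if n < threshold]
    if not frequent or not rare:
        raise ValueError("both groups must be non-empty")
    diffs = []
    for pos in range(len(frequent[0])):
        f = base_frequencies(frequent, pos)
        r = base_frequencies(rare, pos)
        diffs.append({b: f[b] - r[b] for b in "ACGT"})
    return diffs


toy = {"AAT": 100, "AGT": 90, "CGT": 1, "GGT": 2}  # invented 3-nt 'spacers'
print(frequency_difference(toy, threshold=50)[0])
# -> {'A': 1.0, 'C': -0.5, 'G': -0.5, 'T': 0.0}
```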

Spacer choice during unprimed adaptation 

Overall, our analysis suggests that cells containing complementary spacers that must have originated from the same plasmid protospacer can often be found in a population that underwent primed adaptation. The presence of an AAG SAM strongly increases the likelihood that a spacer is inserted in the orientation that generates interference-capable crRNA. This effect appears to be unrelated to the actual frequency of protospacer use, since the increased usage of 'near-AAG' protospacers does not lead to preferential orientation of spacers derived from them. The above trends were revealed by analyzing spacers acquired during primed adaptation, which, in addition to the Cas1/Cas2 proteins that appear to constitute the adaptation machinery, requires a partial match between crRNA and a priming protospacer, an intact Cascade complex and the Cas3 endonuclease/helicase (28). Recently, a large data set of plasmid-derived spacers acquired in the course of unprimed E. coli adaptation caused by Cas1/Cas2 overproduction was obtained (27). To determine whether complementary spacers are generated during unprimed adaptation, we analyzed the available high-throughput sequencing data of Yosef et al. (27). This data set includes 5336 unique spacers out of a total of 9422 possible plasmid-derived spacers. Only 122 protospacers (2.3% of the total) contained an AAG SAM. Together, spacers originating from AAG protospacers constituted 36.7% of all spacers sequenced, reflecting the previously noticed decreased preference for AAG during unprimed adaptation (recall that spacers originating from AAG protospacers constituted 97% of all spacers sequenced in the primed adaptation experiment, above). We found that 94% of spacers from the unprimed adaptation data set have a complementary counterpart. Together, these paired spacers account for 86% of donor protospacers. As in the case of primed adaptation, pairs arising from AAG protospacers were the most frequent, and within such pairs spacers with the functional orientation were more common (Figure 4A). The median ratio between the more and less common complementary spacers in AAG-protospacer-derived pairs was 128 (Figure 4B). For spacer pairs derived from protospacers without AAG, this value was 2.8 (Figure 4B). Both values are similar to those obtained for the corresponding groups of spacers from the primed adaptation experiment (Figure 3C). We therefore conclude that generation of complementary spacers from the same protospacer is a common feature of both primed and unprimed adaptation.
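A quick arithmetic check connects these medians to the Discussion. Dividing the median ratio for AAG-derived pairs by that for non-AAG pairs is a crude summary introduced here for illustration (a ratio of medians, not a per-pair statistic); it gives roughly 46- to 49-fold for both modes of adaptation, in line with the increase of '30-fold or more' discussed below.

```python
# Median within-pair ratios quoted in the text (primed: Figure 3C; unprimed: Figure 4B).
median_ratios = {
    "primed": {"AAG": 194, "non-AAG": 4},
    "unprimed": {"AAG": 128, "non-AAG": 2.8},
}


def bias_fold(medians):
    """Crude fold increase in orientation bias: AAG median ratio / non-AAG median ratio."""
    return medians["AAG"] / medians["non-AAG"]


for mode, medians in median_ratios.items():
    print(mode, round(bias_fold(medians), 1))
# -> primed 48.5
# -> unprimed 45.7

print(f"{122 / 5336:.1%}")  # share of unique unprimed protospacers with AAG -> 2.3%
```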

DISCUSSION 

Our results reveal that when an E. coli culture undergoes CRISPR adaptation, the resulting population of cells invariably contains clones with complementary CRISPR spacers generated from the same protospacer. Simple statistical analysis shows that the pair-mated spacers, which account for most newly acquired spacers, are not generated through independent recognition of the same protospacer sequence on both strands of plasmid DNA but rather result from two possible outcomes of inserting material from a once-recognized protospacer into a CRISPR cassette. This protospacer 'flippage' is a characteristic feature of both primed and unprimed adaptation. In the absence of a consensus AAG SAM, the ratio between the two complementary spacers that originate from the same protospacer is about 1:4 or less (Figure 3C and Figure 4B). This bias is unrelated to strand polarity during primed spacer acquisition (data not shown) and is likely determined by the sequence preferences of Cas1, Cas2 and their putative partners encoded by the bacterial genome, for either the CRISPR repeat or the protospacer-derived intermediate en route to insertion into the cassette (or both). When a protospacer is preceded by an AAG, which is both a preferred SAM during selection of protospacers for spacer acquisition and a functional TIM for interference, the preference for the pair-mated spacer that results in crRNAs capable of target interference is increased 30-fold or more. This bias is observed during both primed and unprimed adaptation. It therefore follows that the preferential 'correct' orientation of AAG-protospacer-derived spacers is determined by Cas1/Cas2 and their putative non-Cas partners, but not by other Cas proteins. This, however, creates an interesting conundrum, since only the protospacer sequence and the last G of the SAM become inserted into the CRISPR cassette. In other words, if a hypothetical intermediate of the adaptation reaction corresponds to the material that is actually inserted into the CRISPR cassette, there does not seem to be a way to distinguish between intermediates derived from AAG- and XXG-protospacers. Yet the two types of protospacers are clearly different with respect to their ability to be preferentially inserted in only one orientation. This apparent paradox may be explained by assuming that the spacer acquisition machinery adopts a particular conformation when acquiring spacers from AAG protospacers. For example, the adaptation machinery may be able to effectively recognize a protospacer with an AAG SAM only when approaching it from one direction, and this directionality is somehow maintained during the generation of the new spacer. The situation may be formally similar to that described for some site-specific recombination systems, where directionality is maintained over long distances and even for DNA sequences located in trans (37). One difficulty with this scenario, however, is that for numerous spacers that arise upon 'slippage' of the adaptation machinery after the recognition of the AAG SAM, the ability to maintain the preferred orientation of spacers is lost (Figure 3C). An alternative and more radical mechanism that can account for the maintenance of the spacer orientation bias for AAG protospacers is shown in Figure 5. According to this model, an intermediate of the spacer insertion reaction contains the entire AAG SAM, with the two adenine residues contributing to spacer orientation but being removed at some later stage of the process (Figure 5).

Figure 4. The interdependence of protospacer choice during unprimed CRISPR adaptation. (A) Self-complementary spacer pairs were ranked according to the quantity of the more frequent spacer in the pair (plotted above the horizontal axis). The corresponding less common spacers for each pair are plotted below the axis. Spacers from protospacers with an AAG SAM are highlighted in red. Spacers from protospacers with a non-AAG SAM are shown in green. (B) A box plot of observed ratios for pair-mated spacers originating from AAG and non-AAG protospacers.

Reciprocal orientations of CRISPR spacers have been reported for Streptococcus agalactiae (38). Mick et al. (39) analyzed CRISPR spacers from an intestinal metagenome.
Figure 5. A possible model of CRISPR-Cas adaptation. The models schematically present the likely sequence of events during adaptation of spacers from AAG (left) and non-AAG protospacers (right). Both kinds of protospacers are schematically shown at the top of the figure as part of foreign DNA (olive green). A fragment that gets inserted into the cassette is shown in orange. AAG protospacers are much more efficient spacer donors than non-AAG protospacers (indicated by the width of the black downward arrows). During adaptation, an intermediate of the spacer acquisition reaction, here depicted as a double-stranded DNA fragment, is formed. In principle, this intermediate may also be single-stranded and/or copied from target DNA. During adaptation from AAG protospacers, the orientation of intermediate insertion into the cassette that results in interference-capable crRNA is strongly favored. In the course of adaptation from non-AAG protospacers, both orientations are likely.

Among the 4171 unique spacers present in their data set, there are eight complementary pairs. While the source of these spacers cannot be determined, one spacer pair matches a spacer from a CRISPR cassette of Bifidobacterium longum. Similarly, acquisition of complementary spacers was reported in Sulfolobus (40). Thus, it is possible that generation of two oppositely oriented spacers from the same protospacer is a common feature of the CRISPR adaptation process rather than a specific feature of the E. coli system. Such a dual mode of spacer insertion may have adaptive value. The two possible outcomes double the number of unique spacers and thus increase the likelihood that at least one of the crRNAs resulting from spacers derived from non-AAG protospacers will support interference. The effect may not be significant for primed adaptation, where most spacers are selected from protospacers with an AAG SAM, but can become important in the case of unprimed adaptation, which is not only ~50 times less frequent than primed adaptation (28) but is also less specific (63% of newly acquired spacers are derived from non-AAG protospacers and thus likely lead to non-functional crRNA (27)). Conversely, the increased bias toward the 'correct' orientation for spacers derived from AAG protospacers should also be biologically relevant, since this orientation is guaranteed to result in crRNA capable of interference.
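The adaptive argument can be made concrete with a deliberately simple probability model. It is a toy assumption, not a result of this study: each orientation of a spacer from a non-AAG protospacer is taken to face an independent, uniformly random flanking trinucleotide, which is interference-capable if it is one of the four functional TIMs (ATG, AAG, AGG, GAG); base composition and the G at position 0 are ignored. Under this model, allowing both orientations roughly doubles the chance that at least one resulting crRNA is protective.

```python
def p_any_functional(p_single, orientations=2):
    """Chance that at least one of `orientations` independent spacer orientations
    is interference-capable, if each is with probability p_single."""
    return 1 - (1 - p_single) ** orientations


p = 4 / 64  # toy: 4 functional TIMs among 64 equally likely trinucleotides
print(p_any_functional(p, 1))  # 0.0625
print(p_any_functional(p, 2))  # 0.12109375
```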